This document is a Release Specification of the RDMA Consortium.

Copies of this document and associated errata may be found at

This document describes an abstract interface to an RDMA-enabled NIC
(RNIC). This interface is implemented as a combination of the RNIC,
its associated firmware, and host software. It provides access to
the RNIC queuing and memory management resources, as well as the
underlying networking layers.

Block List - A list of physical addresses describing a set of memory
blocks, which specifies the block size, list of physical
addresses, and offset to the start of the memory region of the
first block. Each block has the same length and that length can
be any value in the range supported by the RNIC. Each block may
start at a byte granularity address. The starting address for
the entire list may be an offset into the first block and the
entire list may have any length.
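
To illustrate how a Block List describes a region, the following C sketch translates a byte offset within the memory region into a physical address. The structure and function names are hypothetical and are not defined by this specification; the sketch assumes the first-block offset is smaller than the block size and ignores arithmetic overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of a Block List: every block has the same
 * length, the region starts first_block_offset bytes into block 0, and
 * blocks may begin at any byte address. */
struct block_list {
    uint64_t block_size;          /* common length of every block */
    uint64_t first_block_offset;  /* start of the region within block 0 */
    const uint64_t *phys_addrs;   /* physical address of each block */
    size_t num_blocks;
};

/* Translate a byte offset within the memory region into a physical
 * address. Returns 0 on success, -1 if the offset falls past the last
 * block or block_size is zero. */
static int block_list_translate(const struct block_list *bl,
                                uint64_t region_offset, uint64_t *phys_out)
{
    if (bl->block_size == 0)
        return -1;
    /* Position counted from the start of block 0 (overflow ignored). */
    uint64_t pos = bl->first_block_offset + region_offset;
    uint64_t index = pos / bl->block_size;
    if (index >= bl->num_blocks)
        return -1;
    *phys_out = bl->phys_addrs[index] + pos % bl->block_size;
    return 0;
}
```

For example, with three 0x1000-byte blocks and a first-block offset of 0x800, region offset 0x800 falls exactly at the start of the second block, even though the blocks need not be physically contiguous.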

Complete (Completed, Completion, Completes) - When the Consumer can
determine that a particular RDMA Operation has performed all
functions specified for the RDMA Operation, including Placement
and Delivery. This can be determined through a Work Completion
for Signaled Work Requests. For Unsignaled Work Requests, this
means that the Completion Rules have been met. Note that this is
a superset of the [RDMAP] definition for RDMA Completion.

Completion Error - A Processing Error reported through the
Completion Queue.

Completion Queue (CQ) - A sharable queue containing one or more
entries which can contain Completion Queue Entries. A CQ is used
to create a single point of completion notification for multiple
Work Queues. The Work Queues associated with a Completion Queue
may be from different QPs and of differing queue types (SQs or
RQs).

Completion Queue Entry (CQE) - The RNIC Interface internal
representation of a Work Completion.

Completion Status - The resultant status of a Work Request returned
as part of a Work Completion.

Consumer, Verbs Consumer - A software process that communicates
using RDMA/DDP Verbs. The Consumer typically consists of an
application program, or an operating system adaptation layer,
which provides some OS-specific API.

Direct Data Placement Protocol (DDP) - A wire protocol that supports
Direct Data Placement by associating explicit memory buffer
placement information with the LLP payload units.

Data Delivery (Delivery, Delivered, Delivers) - Delivery is defined
as the process of informing the ULP or Consumer that a
particular Message is available for use. This is specifically
different from Data Placement, which may generally occur in any
order, while the order of Data Delivery is strictly defined.

Data Placement (Placement, Placed, Places) - A mechanism whereby ULP
data contained within RDMA/DDP Segments may be put directly into

Remote Peer - The RDMA protocol implementation on the opposite end
of the connection. Used to refer to the remote entity when
describing protocol exchanges or other interactions between two
Nodes.

Remote RDMA Read Operation - A sequence of events that begins upon
receipt of an incoming RDMA Read Request by the RI and stays
in-process until the corresponding RDMA Read Response Message has
been generated. This includes posting the RDMA Read Request to
the Inbound RDMA Read Request Queue (see Section 6.5 - Outstanding
RDMA Read Resource Management).

RNIC Interface (RI) - The presentation of the RNIC to the Verbs
Consumer as implemented through the combination of the RNIC and
the RNIC device driver.

Scatter/Gather Element (SGE) - An individual entry in a
Scatter/Gather List. Each SGE consists of an STag, Tagged Offset,
and Length.

Scatter/Gather List (SGL) - A list of Scatter/Gather Elements. The
list describes one or more ULP Buffers which will have their
data gathered on transmission or scattered upon reception.
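
The following C sketch illustrates the gather half of an SGL on transmission. The registration table, the helper names, and the treatment of each Tagged Offset as a zero-based offset into its region are assumptions made for illustration only; they are not RI behavior defined by this specification.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One Scatter/Gather Element: STag, Tagged Offset, and Length. */
struct sge {
    uint32_t stag;
    uint64_t to;
    uint32_t length;
};

/* Hypothetical registration table entry mapping an STag to local memory.
 * Tagged Offsets are assumed to be zero-based within the region. */
struct mr_entry {
    uint32_t stag;
    unsigned char *base;
    uint64_t length;
};

static const struct mr_entry *find_mr(const struct mr_entry *tbl, size_t n,
                                      uint32_t stag)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].stag == stag)
            return &tbl[i];
    return NULL;
}

/* Gather the buffers described by an SGL into out, in list order.
 * Returns the number of bytes gathered, or -1 on an unknown STag,
 * an out-of-bounds SGE, or insufficient space in out. */
static long sgl_gather(const struct sge *sgl, size_t nsge,
                       const struct mr_entry *tbl, size_t ntbl,
                       unsigned char *out, size_t out_len)
{
    size_t done = 0;
    for (size_t i = 0; i < nsge; i++) {
        const struct mr_entry *mr = find_mr(tbl, ntbl, sgl[i].stag);
        if (mr == NULL || sgl[i].to > mr->length ||
            sgl[i].length > mr->length - sgl[i].to)
            return -1;
        if (sgl[i].length > out_len - done)
            return -1;
        memcpy(out + done, mr->base + sgl[i].to, sgl[i].length);
        done += sgl[i].length;
    }
    return (long)done;
}
```

Two SGEs naming different STags, for instance one covering "hello " and one covering "world", are gathered into a single contiguous "hello world" payload; scattering on reception is the mirror image.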

Send - An RDMA Operation that transfers the contents of an Untagged
buffer from the Local Peer to an Untagged buffer at the Remote
Peer.

Send Operation Types - The set of Send operations that result in the
consumption of a Receive Queue Work Request at the Data Sink.
Specifically, this includes Send, Send with Invalidate, Send with
Solicited Event and Send with Solicited Event & Invalidate.

Send Queue (SQ) - One of the two Work Queues associated with a Queue
Pair. The Send Queue contains PostSQ Work Queue Elements that
have specific operation types, such as Send Type, RDMA Write, or
RDMA Read Type Operations, as well as STag operations such as
Bind and Invalidate.

Shared Memory Region - An MR that currently shares, or at one time
shared, the Physical Buffer List associated with the Memory
Region. Specifically, the PBL is currently shared or was
previously shared with another Memory Region.

Shared Receive Queue - An optional mechanism which allows the
Receive Queues from multiple QPs to retrieve Receive Queue Work
Queue Elements from the same shared queue as needed.

Signaled - A WR which requires that the RNIC generate a Work
Completion.

Solicited Event (SE) - A facility by which an RDMA Operation sender
may cause an Event to be generated at the recipient, if the
recipient is configured to generate such an Event, when a Send
with Solicited Event or Send with Solicited Event & Invalidate
Message is received.

Steering Tag (STag) - An identifier of a Memory Window or Memory
Region. STags are composed of two components: an STag Index and
an STag Key. The Consumer forms the STag by combining the STag
Index with the STag Key. This specification further refines the
definitions of STags contained in [RDMAP] and [DDP].

STag Key - The least significant 8-bit portion of an STag. This
field of an STag can be set to any value by the Consumer when
performing a Memory Registration operation, such as Bind Memory
Window, Fast-Register Memory Region and Register Memory Region.

STag Index - The most significant 24 bits of an STag. This field of
the STag is managed by the RI and is treated as an opaque object
by the Consumer.
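
A minimal C sketch of how the two components combine into a 32-bit STag, with the STag Key in the least significant 8 bits and the STag Index in the most significant 24 bits. The helper names are hypothetical.

```c
#include <stdint.h>

/* Compose a 32-bit STag from a 24-bit STag Index (most significant bits)
 * and an 8-bit STag Key (least significant bits). Index bits above
 * bit 23 are discarded. */
static uint32_t stag_make(uint32_t index, uint8_t key)
{
    return ((index & 0xFFFFFFu) << 8) | key;
}

/* Extract the 24-bit STag Index. */
static uint32_t stag_index(uint32_t stag) { return stag >> 8; }

/* Extract the 8-bit STag Key. */
static uint8_t stag_key(uint32_t stag) { return (uint8_t)(stag & 0xFFu); }
```

For example, combining STag Index 0x123456 with STag Key 0xAB yields the STag 0x123456AB.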

Tagged Buffer - A buffer that can be Advertised to a Remote Peer
through exchange of an STag, Tagged Offset, and length.

Tagged Offset (TO) - The offset within a Tagged Buffer.

Terminate - An RDMA Message used by a Node to pass an error
indication to the Remote Peer on an RDMA Stream.

Upper Layer Protocol (ULP) - The protocol layer above the Verb
layer. An example is SDP.

ULP Buffer - A buffer owned above the RI that can be represented
within the RNIC, in whole or in part, by a Memory Window or a
Memory Region.

ULP Message - The ULP data that is handed to a specific protocol
layer for transmission. Data boundaries are preserved as they
are transmitted through iWARP.

ULP Payload - The portion of a ULP Message that is contained within
a single protocol segment or packet (e.g. a DDP Segment).

Unaffiliated Asynchronous Event - This is an indication from the
Verb layer to the Consumer that an event has occurred unrelated
to any single identifiable RNIC Resource.

Unsignaled - A Work Request which only generates a Work Completion
if it encounters an error during processing.

which help a Consumer to efficiently notice when WRs have completed
processing in the RI. There may be thousands of CQs per RNIC.

Event Handlers provide the mechanism for Consumers to be notified of
Asynchronous Events which occur within the RI but which cannot be
reported through the Completion Queues due to their asynchronous
nature or the fact that they are not easily associated with a Work
Completion.

5.1 The RNIC

Consumers gain access to an RNIC through the RNIC Interface. The
Verbs allow the Consumer to open the RNIC, retrieve RNIC attributes,
and close the RNIC.

All resources MUST be in the scope of the RNIC on which they are
created. This means that there is no requirement for resources on
one RNIC to be available, associated with or meaningful to another
RNIC, even if they are managed by the same RNIC driver. This
includes all QPs, STags, PDs, CQs, and multiple Completion Event
Handlers. This also means that any IDs which are created by the RI
are specific to that RNIC and are not guaranteed to be unique across
all RNICs.

An intent of the architecture is to allow an implementation to pass
Work Requests and Work Completions to and from a Non-Privileged Mode
Consumer process directly to and from the RNIC. Another intent of
the architecture is to optimize for a Privileged Mode
implementation, which shares the Work Request and Work Completion
requirements of Non-Privileged Mode Consumers but has slightly
different memory management requirements.

Because the architecture attempts to optimize for both Privileged
Mode and Non-Privileged Mode Consumers, there are some Verbs and
Verb modes which are not allowed to be executed by Non-Privileged
Mode Consumers. Examples of this are the use of the STag of zero and
the ability to post Fast-Register WRs. In addition, there are some
operations that, while being allowed in kernel mode, are intended to
be used by Non-Privileged Mode applications. An example of this is
Memory Windows. Any restrictions are clearly specified in this
document where required.

5.1.1 RNIC Resources

RNIC Resources can be allocated from a variety of places. They can
be allocated in host memory on behalf of the Consumer or allocated
within the RNIC. Where an RNIC allocates resources is implementation
specific. Consequently, values that the RNIC returns as output
modifiers when Querying the RNIC indicate the maximum amount of any

8 least significant bits of the STag. The STag Index is the 24 most
significant bits of the STag. 

The 8-bit STag Key is provided by the Consumer. The Consumer can use
the STag Key in any way it desires. For example, it can be used as
an incrementing value to help discover application errors by using a
different value with each registration. As a general rule, the
Consumer provides the STag Key to the RI whenever the Consumer
causes the transition of an STag to the Valid state, or when the
STag is being Invalidated. In the Invalid state, only the STag Index
is meaningful.
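
The incrementing-key usage described above can be sketched in C as follows. The slot structure and both functions are hypothetical; the sketch models only the key comparison, not the RI's full STag state machine or the Invalidate step that would normally precede a new registration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-index state for one STag Index. */
struct stag_slot {
    uint32_t index;   /* 24-bit STag Index assigned by the RI */
    uint8_t key;      /* STag Key supplied by the Consumer */
    bool valid;
};

/* Consumer side: choose a new key for each registration by incrementing
 * the previous one (wrapping modulo 256), then mark the STag Valid and
 * return the full 32-bit STag. */
static uint32_t reregister(struct stag_slot *s)
{
    s->key = (uint8_t)(s->key + 1);
    s->valid = true;
    return ((s->index & 0xFFFFFFu) << 8) | s->key;
}

/* Access check: accepted only if the slot is Valid and the full STag,
 * including the key, matches. A stale STag from an earlier registration
 * carries an old key and is rejected. */
static bool stag_accepts(const struct stag_slot *s, uint32_t stag)
{
    return s->valid && (stag >> 8) == s->index && (uint8_t)stag == s->key;
}
```

Because the key is only 8 bits, it wraps after 256 registrations, so a sufficiently stale STag can match again; the key helps surface application errors rather than guaranteeing their detection.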

There is no default value for the STag Key. The RI MUST use the STag 
Key provided by the Consumer for the following Verbs: 

* Register Non-Shared Memory Region, 

* Register Shared Memory Region, 

* Reregister Non-Shared Memory Region, 

* PostSQ Verb Fast-Register Non-Shared Memory Region operation,

* PostSQ Verb Bind operation, and

* PostSQ Invalidate Local STag.

The RI MUST return the value of the STag Index sub-field on an 
invocation of the following: 

* Allocate Non-Shared Memory Region STag, 

* Allocate Memory Window, 

* Register Non-Shared Memory Region, 

* Register Shared Memory Region, and 

* Reregister Non-Shared Memory Region. 

The RI MUST use the same STag Index sub-field as was passed in by
the Consumer on an invocation of the following:

* Query Memory Region, 

* Query Memory Window, 

* Register Shared Memory Region, 

* The memory access as specified by the TO and length is within the
base and bounds of the Memory Region. The RI MUST enforce this
with byte-level granularity.

If the length of the access is zero, the RI MUST NOT perform any of
the above checks on the Memory Region.
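
A byte-granular base-and-bounds check with the zero-length exemption might look like the following C sketch. The mr_bounds structure and function name are hypothetical, and the Memory Region is assumed to cover the Tagged Offsets [base_to, base_to + length). Only the bounds check is shown; the other checks listed above are likewise skipped for a zero-length access.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical base and bounds of a Memory Region in Tagged Offset space:
 * valid TOs are [base_to, base_to + length). */
struct mr_bounds {
    uint64_t base_to;
    uint64_t length;
};

/* Byte-granular bounds check for an access of len bytes starting at TO.
 * A zero-length access is accepted without performing any check. */
static bool access_in_bounds(const struct mr_bounds *mr, uint64_t to,
                             uint64_t len)
{
    if (len == 0)
        return true;
    if (to < mr->base_to)
        return false;
    uint64_t off = to - mr->base_to;
    /* Compare against the remaining length instead of computing
     * to + len, which could wrap around for very large values. */
    return off <= mr->length && len <= mr->length - off;
}
```

Writing the test as `len <= length - off` rather than `to + len <= base_to + length` keeps a huge length from overflowing and slipping past the check.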

7.7 Querying Memory Regions

Memory Regions have attributes that can be retrieved through the
Query Memory Region Verb. The RI MUST support the complete list of
Memory Region attributes as described in Section 9.2.6.3 - Query
Memory Region.

7.8 Invalidating Memory Regions

When access to a Non-Shared Memory Region by an RI is no longer
required, but the Consumer wants to retain the STag for use in
future Fast-Register Non-Shared Memory Region and Reregister
Non-Shared Memory Region Verb invocations, the Consumer may directly
invalidate access to the Non-Shared Memory Region through an
Invalidate Local STag WR or an RDMA Read with Invalidate Local STag
WR. Additionally, an STag may be invalidated by a remote Consumer
through the use of a Send with Invalidate Message or a Send with
Solicited Event and Invalidate Message.

Multiple Memory Regions can represent memory locations that have
been registered multiple times. The invalidation of a single STag
prevents RNIC access to those memory locations via the STag
associated with that Memory Region. Access to the memory locations
via STags associated with Memory Regions other than the one whose
STag is being Invalidated MUST NOT be affected. Invalidating an STag
associated with a Memory Region that partially or completely
overlaps other Memory Regions MUST NOT cause the RI to affect the
registration of those other Memory Regions.
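
The following C sketch illustrates the rule that invalidating one STag leaves overlapping registrations intact. The registration record and both functions are hypothetical; a real RI tracks considerably more state per Memory Region.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registration record: several records may describe the
 * same or overlapping memory, each under its own STag. */
struct registration {
    uint32_t stag;
    uint64_t start;   /* first byte covered */
    uint64_t length;  /* number of bytes covered */
    bool valid;
};

/* Invalidate exactly one STag. Other registrations, even ones covering
 * the same bytes, are left untouched. Returns false if not found. */
static bool invalidate_stag(struct registration *regs, size_t n,
                            uint32_t stag)
{
    for (size_t i = 0; i < n; i++) {
        if (regs[i].stag == stag) {
            regs[i].valid = false;
            return true;
        }
    }
    return false;
}

/* Report whether byte addr is reachable through the given STag. */
static bool reachable(const struct registration *regs, size_t n,
                      uint32_t stag, uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (regs[i].stag == stag)
            return regs[i].valid && addr >= regs[i].start &&
                   addr - regs[i].start < regs[i].length;
    return false;
}
```

If two registrations overlap at a given byte, invalidating the first STag removes access through it while the second STag still reaches the same byte.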

The requirements for unpinning the physical buffers associated with
deallocated Memory Regions are covered in Section 7.6.2 - Physical
Buffer Lists.

Invalidating an STag associated with a Shared Memory Region MUST
result in a Completion Error. Consequently, using an STag
associated with a Shared Memory Region under the following
conditions will cause a Completion Error at the Data Sink that
results in the LLP Stream being torn down after the data transfer
operation takes place:

* As the STag specified in an Invalidate Local STag WR.

* As the Data Sink STag for an RDMA Read with Invalidate Local
STag WR.
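
A hypothetical C sketch of the Shared Memory Region rule as applied to an Invalidate Local STag WR. The status values and structure are illustrative only. The sketch does not model tearing down the LLP Stream, and leaving the Region's state unchanged on error is an assumption of the sketch rather than a requirement stated here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical completion statuses for this sketch. */
enum wc_status { WC_SUCCESS, WC_COMPLETION_ERROR };

struct mr_state {
    uint32_t stag;
    bool shared;   /* true for a Shared Memory Region */
    bool valid;
};

/* Process an Invalidate Local STag WR against one Memory Region.
 * Invalidating the STag of a Shared Memory Region yields a Completion
 * Error; in this sketch the Region's state is then left unchanged. */
static enum wc_status invalidate_local_stag(struct mr_state *mr)
{
    if (mr->shared)
        return WC_COMPLETION_ERROR;
    mr->valid = false;
    return WC_SUCCESS;
}
```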

An RDMA Protocol Specification (Version 1.0)

1 Status of this Memo

This document is a Release Specification of the RDMA Consortium.
Copies of this document and associated errata may be found at
http://www.rdmaconsortium.org.

2 Abstract

This document defines a Remote Direct Memory Access Protocol (RDMAP)
that operates over the Direct Data Placement Protocol (DDP
protocol). RDMAP provides read and write services directly to
applications and enables data to be transferred directly into ULP
Buffers without intermediate data copies. It also enables a kernel
bypass implementation.

The control information of RDMA Messages is included in DDP protocol
defined header fields, with the following exceptions:

* The first octet reserved for ULP usage on all DDP Messages in the
DDP Protocol (i.e. the RsvdULP Field) is used by RDMAP to carry
the RDMA Message Opcode and the RDMAP version. This octet is
known as the RDMAP Control Field in this specification. For Send
with Invalidate and Send with Solicited Event and Invalidate,
RDMAP uses the second through fifth octets provided by DDP on
Untagged DDP Messages to carry the STag that will be Invalidated.

* The RDMA Message length is passed by the RDMAP layer to the DDP
layer on all outbound transfers.

* For RDMA Read Request Messages, the RDMA Read Message Size is
included in the RDMA Read Request Header.

* The RDMA Message length is passed to the RDMAP Layer by the DDP
layer on inbound Untagged Buffer transfers.

* Two RDMA Messages carry additional RDMAP headers. The RDMA Read
Request carries the Data Sink and Data Source buffer
descriptions, including buffer length. The Terminate carries
additional information associated with the error that caused the
Terminate.
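
As a sketch of the RsvdULP usage described above, the following C code packs the five RsvdULP octets of an Untagged DDP header: the RDMAP Control Field in the first octet and, for the two Invalidate variants, the Invalidate STag in the second through fifth octets. The function names are hypothetical; network (big-endian) byte order for the STag and zero-filling of the otherwise unused octets are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Pack the five RsvdULP octets of an Untagged DDP header.
 * out[0] is the RDMAP Control Field. When carries_invalidate is true
 * (Send with Invalidate, Send with Solicited Event and Invalidate),
 * out[1..4] hold the Invalidate STag, most significant octet first;
 * otherwise they are zero-filled (an assumption of this sketch). */
static void pack_untagged_rsvdulp(uint8_t out[5], uint8_t rdmap_control,
                                  bool carries_invalidate, uint32_t inv_stag)
{
    memset(out, 0, 5);
    out[0] = rdmap_control;
    if (carries_invalidate) {
        out[1] = (uint8_t)(inv_stag >> 24);
        out[2] = (uint8_t)(inv_stag >> 16);
        out[3] = (uint8_t)(inv_stag >> 8);
        out[4] = (uint8_t)inv_stag;
    }
}

/* Recover the Invalidate STag from octets 2 through 5. */
static uint32_t unpack_invalidate_stag(const uint8_t in[5])
{
    return ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 8) | (uint32_t)in[4];
}
```

Shifting bytes out explicitly, rather than copying a host integer with memcpy, keeps the wire layout independent of the host's endianness.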

6.1 RDMAP Control and Invalidate STag Field

The version of RDMAP defined by this specification uses all 8 bits
of the RDMAP Control Field. The first octet reserved for ULP use in
the DDP Protocol MUST be used by the RDMAP to carry the RDMAP
Control Field. The ordering of the bits in the first octet MUST be
as defined in Figure 3 DDP Control, RDMAP Control, and Invalidate
STag Field. For Send with Invalidate and Send with Solicited Event
and Invalidate, the second through fifth octets of the DDP RsvdULP
field MUST be used by RDMAP to carry the Invalidate STag. Figure 3
DDP Control, RDMAP Control, and Invalidate STag Field depicts the
format of the DDP Control and RDMAP Control fields. (Note: In Figure
3 DDP Control, RDMAP Control, and Invalidate STag Field, the DDP
Header is offset by 16 bits to accommodate the MPA header defined in
[MPA]. The MPA header is only present if DDP is layered on top of
MPA.)
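
The RDMAP Control Field can be packed and parsed as in the following C sketch. Figure 3 is not reproduced here, so the bit layout is an assumption: a 2-bit version in the most significant bits, two reserved zero bits, and a 4-bit opcode in the least significant bits. The opcode values are likewise assumed, taken from the IETF RDMAP specification (RFC 5040). Verify both against Figure 3 before relying on them. The version number itself is left as a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode values assumed from the IETF RDMAP opcode table (RFC 5040). */
enum rdmap_opcode {
    RDMAP_RDMA_WRITE = 0x0,
    RDMAP_RDMA_READ_REQUEST = 0x1,
    RDMAP_RDMA_READ_RESPONSE = 0x2,
    RDMAP_SEND = 0x3,
    RDMAP_SEND_INVALIDATE = 0x4,
    RDMAP_SEND_SE = 0x5,
    RDMAP_SEND_SE_INVALIDATE = 0x6,
    RDMAP_TERMINATE = 0x7,
};

/* Assumed layout of the 8-bit RDMAP Control Field:
 * bits 7-6 version, bits 5-4 reserved (zero), bits 3-0 opcode. */
static uint8_t rdmap_control_pack(uint8_t version, enum rdmap_opcode op)
{
    return (uint8_t)(((version & 0x3u) << 6) | ((unsigned)op & 0xFu));
}

static uint8_t rdmap_control_version(uint8_t ctl) { return ctl >> 6; }

static enum rdmap_opcode rdmap_control_opcode(uint8_t ctl)
{
    return (enum rdmap_opcode)(ctl & 0xFu);
}

/* True for the two opcodes whose RsvdULP octets 2-5 carry an
 * Invalidate STag. */
static bool rdmap_carries_invalidate_stag(enum rdmap_opcode op)
{
    return op == RDMAP_SEND_INVALIDATE || op == RDMAP_SEND_SE_INVALIDATE;
}
```

Under these assumptions, version 1 with Send with Invalidate packs to the control octet 0x44.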